
Reviews: Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update

Neural Information Processing Systems

The paper proposes to use episodic backward updates to improve data efficiency in RL tasks; furthermore, the authors introduce a soft relaxation of this scheme to combat the overestimation that typically arises when backward updates are combined with neural network models. Overall, the paper is very clearly written. My main concerns are with the experimental details and the literature review; moreover, when the existing literature is taken into account, the novelty of the work is quite limited. The idea of using backward updates is quite old and goes back to at least the 1993 paper "Prioritized Sweeping" by Moore and Atkeson, which in fact demonstrates a method very similar to what the authors propose and which the authors fail to cite. Furthermore, several recent papers operate in a similar space of ideas, using a backward view in ways similar to the authors', e.g. "Fast deep reinforcement learning using online adjustments from the past" (https://arxiv.org/abs/1810.08163).


Reviews: Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update

Neural Information Processing Systems

All reviewers recommend accepting the paper. The authors' response addressed most of the reviewers' concerns. While the AC recommends accepting the paper, the AC encourages the authors to consider the comments of reviewer 1. Changing only the backup mechanism while keeping all other hyperparameters fixed as in the Nature DQN model is indeed a good experimental setup. However, the optimal operating regime for different models might differ (even when they share architectures and training protocols): for instance, we could 'afford' a larger learning rate if we have a better backup mechanism.


Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update

Neural Information Processing Systems

We propose Episodic Backward Update (EBU), a novel deep reinforcement learning algorithm with direct value propagation. Our computationally efficient recursive algorithm allows sparse and delayed rewards to propagate directly through all transitions of the sampled episode. We theoretically prove the convergence of the EBU method and experimentally demonstrate its performance in both deterministic and stochastic environments. In particular, on 49 games of the Atari 2600 domain, EBU achieves the same mean and median human-normalized performance as DQN using only 5% and 10% of the samples, respectively.
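The backward value propagation described in the abstract can be illustrated with a tabular sketch. This is a hypothetical simplification, not the paper's exact deep-RL recursion: the diffusion-style mixing factor `beta` and the form of the mixed target are assumptions made for illustration.

```python
import numpy as np

def episodic_backward_update(Q, episode, alpha=0.1, gamma=0.99, beta=0.5):
    """Tabular sketch of an episodic backward update.

    Q: dict mapping state -> np.array of action values.
    episode: list of (state, action, reward, next_state) transitions in
    time order. beta is a diffusion factor (an assumption here) mixing
    the freshly propagated target with the stored bootstrap estimate.
    """
    target = 0.0  # value propagated backward from the end of the episode
    for (s, a, r, s_next) in reversed(episode):
        # One-step bootstrap from the stored table (0 for unseen/terminal states).
        bootstrap = np.max(Q[s_next]) if s_next in Q else 0.0
        # Mix the backward-propagated return with the usual one-step target.
        target = r + gamma * (beta * target + (1.0 - beta) * bootstrap)
        Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

With `beta = 1` a single sweep carries a terminal reward all the way back to the first transition of the episode, which is the intuition behind the sample-efficiency gains on sparse-reward tasks.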


Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update

Lee, Su Young, Choi, Sungik, Chung, Sae-Young

Neural Information Processing Systems



Sample-efficient Deep Reinforcement Learning for Dialog Control

Asadi, Kavosh, Williams, Jason D.

arXiv.org Machine Learning

Representing a dialog policy as a recurrent neural network (RNN) is attractive because it handles partial observability, infers a latent representation of state, and can be optimized with supervised learning (SL) or reinforcement learning (RL). For RL, a policy gradient approach is natural but sample-inefficient. In this paper, we present three methods for reducing the number of dialogs required to optimize an RNN-based dialog policy with RL. The key idea is to maintain a second RNN that predicts the value of the current policy, and to apply experience replay to both networks. On two tasks, these methods reduce the number of dialogs/episodes required by about a third compared with standard policy gradient methods.
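The key idea above, a separate value model serving as a baseline with experience replay feeding both networks, can be sketched in a toy form. Linear models stand in for the paper's RNNs, importance weighting for off-policy replay is omitted for brevity, and the class name `ReplayPG` and its update rules are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

class ReplayPG:
    """Toy sketch: a policy model and a separate value (baseline)
    model, both updated from a shared replay buffer of past episodes.
    Linear models replace the RNNs of the paper (a simplification)."""

    def __init__(self, n_features, n_actions, lr=0.05):
        self.theta = np.zeros((n_features, n_actions))  # policy weights
        self.w = np.zeros(n_features)                   # value weights
        self.lr = lr
        self.buffer = []                                # stored episodes

    def policy(self, x):
        logits = x @ self.theta
        p = np.exp(logits - logits.max())               # stable softmax
        return p / p.sum()

    def store(self, episode):
        # episode: list of (features, action, return-to-go) tuples
        self.buffer.append(episode)

    def replay_update(self, batch=4):
        # Sample past episodes and update BOTH networks from them.
        idx = rng.integers(0, len(self.buffer),
                           size=min(batch, len(self.buffer)))
        for i in idx:
            for (x, a, G) in self.buffer[i]:
                adv = G - x @ self.w                  # advantage vs. baseline
                self.w += self.lr * adv * x           # value regression step
                p = self.policy(x)
                grad = -np.outer(x, p)                # d log pi(a|x) / d theta
                grad[:, a] += x
                self.theta += self.lr * adv * grad    # policy gradient step
```

After storing an episode where action 0 earned a positive return, a replay update raises both the baseline estimate and the probability of action 0, showing how the same replayed data trains the two models together.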